1. Summary and discussion

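Consider the self-normalized sum

$$T := \frac{S}{V},$$

where

$$S := X_1 + \dots + X_n, \qquad V := \sqrt{X_1^2 + \dots + X_n^2},$$

and $X_1, \dots, X_n$ are independent zero-mean random variables (r.v.'s). It is
assumed that $T = 0$ on the event $\{V = 0\}$. For any $p \in (0, \infty)$, introduce also

$$\beta_p := \sum_{i=1}^n E|X_i|^p \quad\text{and}\quad \tilde\beta_p := \sum_{i=1}^n E\bigl|X_i^2 - E X_i^2\bigr|^{p/2},$$

assuming that $0 < \beta_3 < \infty$ (and hence $0 < \beta_2 < \infty$).
Let $\Phi$ be the standard normal distribution function.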



Theorem 1.1. One has 



- +A 4 ^ + A 6 -% (1.1) 



for all $z \in \mathbb{R}$ and for all triples $\tau := (A_3, A_4, A_6)$ of absolute constants belonging
to the set $\mathcal{T} := \{\tau_1, \dots, \tau_4\}$ of triples, where

$\tau_1 := (1.61, 1.60, 1.20)$,

$\tau_2 := (2.01, 1.02, 0.61)$,

$\tau_3 := (11.38, 11.02, 11.78 \times 10^{-6})$,

$\tau_4 := (1.34, 125377, 1.049 \times 10^{6})$.

The triple $\tau_1 = (1.61, 1.60, 1.20)$ of the constant factors $A_3, A_4, A_6$ was
obtained by trying to minimize the maximum $A_3 \vee A_4 \vee A_6$ of the constants; for details,
see the proof (in Section 2) of Theorem 1.1 and especially the table at the end
of that proof. The triple $\tau_3$ was obtained by trying to minimize the effect of the
6th-order moments of the $X_i$'s. The triple $\tau_4$ was designed to work best when
$\tilde\beta_4$ and $\tilde\beta_6$ are very small, that is, when the distribution of each $X_i$ is close to
the symmetric distribution on a symmetric two-point set. The triples $\tau_2, \tau_3, \tau_4$
will be used in this paper to compare the upper bound in (1.1) with one due to
Shao [16].

In the i.i.d. case, that is, when the r.v.'s $X_1, \dots, X_n$ are independent copies
of a r.v. $X$, one can improve the values $A_3, A_4, A_6$ of the absolute constants in
(1.1); here, let us assume without loss of generality that

$$E X^2 = 1.$$
Introduce

$$\rho_3 := E|X|^3, \qquad \tilde\rho_4 := \sqrt{E|X^2 - 1|^2}, \qquad \tilde\rho_6 := \frac{E|X^2 - 1|^3}{E|X|^3}. \tag{1.2}$$



Theorem 1.2. If $X, X_1, \dots, X_n$ are i.i.d. r.v.'s with $E X = 0$, $E X^2 = 1$, and
$E|X|^3 < \infty$, then



| P(T < z) - $0)| ^ 



^-3P3 + -44/04 + 



(1.3) 



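for all $z \in \mathbb{R}$ and for all triples $\tau := (A_3, A_4, A_6)$ of absolute constants belonging
to the set $\tilde{\mathcal{T}} := \{\tilde\tau_{1,1}, \tilde\tau_{1,2}, \tilde\tau_{2,2}, \tilde\tau_{2.1,1}, \tilde\tau_{3,1}, \tilde\tau_{4,1}\}$ of triples, where

$\tilde\tau_{1,1} := (1.53, 1.52, 1.34)$,

$\tilde\tau_{1,2} := (1.61, 1.60, 1.02)$,

$\tilde\tau_{2,2} := (1.96, 1.02, 0.52)$,

$\tilde\tau_{2.1,1} := (1.96, 0.99, 0.63)$,

$\tilde\tau_{3,1} := (10.94, 9.40, 11.06 \times 10^{-6})$,

$\tilde\tau_{4,1} := (1.25, 8140, 92437)$;

here, for each $i = 1, 2, 3, 4$, the triples $\tilde\tau_{i,j}$ are to be compared with the triple $\tau_i$
in Theorem 1.1, with the same $i$.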



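For $n \ge 2$, the Student statistic

$$t := \frac{\bar X \sqrt{n}}{\sqrt{\frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2}},$$

where $\bar X := \frac{1}{n} \sum_{i=1}^n X_i$, can be expressed as a monotonic transformation of the
self-normalized sum $T$:

$$t = \sqrt{\frac{n-1}{n}}\; \frac{T}{\sqrt{1 - T^2/n}}. \tag{1.4}$$

Therefore, one immediately has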



Corollary 1.3. Theorems 1.1 and 1.2 hold if $P(T \le z) - \Phi(z)$ is replaced there
by $P(t \le z) - \Phi_n(z)$, where

$$\Phi_n(z) := \Phi\Bigl(\frac{z}{\sqrt{1 + (z^2 - 1)/n}}\Bigr). \tag{1.5}$$
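The algebra behind (1.4) and (1.5) can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names and the sample values are illustrative assumptions. It computes $T$ and $t$ directly from their definitions and confirms that each is obtained from the other by the monotone maps in (1.4) and (1.5).

```python
import math

def self_normalized_sum(xs):
    """T = S / V, with the convention T = 0 on the event V = 0."""
    s = sum(xs)
    v = math.sqrt(sum(x * x for x in xs))
    return 0.0 if v == 0.0 else s / v

def student_t(xs):
    """One-sample Student statistic: sqrt(n) * mean / (sample standard deviation)."""
    n = len(xs)
    mean = sum(xs) / n
    sample_var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(sample_var)

def t_from_T(T, n):
    """The monotone map in (1.4), taking T to t."""
    return math.sqrt((n - 1) / n) * T / math.sqrt(1.0 - T * T / n)

def T_from_t(t, n):
    """Inverse of (1.4); this is the argument of Phi in (1.5)."""
    return t / math.sqrt(1.0 + (t * t - 1.0) / n)

sample = [0.3, -1.2, 2.5, 0.7, -0.4, 1.1]  # arbitrary illustrative data
n = len(sample)
T = self_normalized_sum(sample)
t = student_t(sample)
print(T, t, t_from_T(T, n), T_from_t(t, n))
```

Since the map in (1.4) is increasing, $\{t \le z\} = \{T \le z/\sqrt{1 + (z^2-1)/n}\}$, which is why $\Phi_n$ rather than $\Phi$ appears in Corollary 1.3.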



A Berry-Esseen type of bound of the optimal order for the Student statistic
of i.i.d. $X_i$'s was obtained in 1996 by Bentkus and Götze [2], using a Fourier
transformation method. This was extended to the non-i.i.d. case by Bentkus,
Bloznelis, and Götze [1], whose result can be rewritten as follows:

$$|P(t \le z) - \Phi(z)| \le C_2 \gamma_2 + C_3 \gamma_3, \tag{1.6}$$

where $C_2$ and $C_3$ are absolute constants,

$$\gamma_2 := \frac{1}{\beta_2} \sum_{i=1}^n E X_i^2\, \mathrm{I}\{|X_i| > \beta_2^{1/2}/2\}, \qquad
\gamma_3 := \frac{1}{\beta_2^{3/2}} \sum_{i=1}^n E |X_i|^3\, \mathrm{I}\{|X_i| \le \beta_2^{1/2}/2\}. \tag{1.7}$$

Note that $t \sim T$ as $n \to \infty$. The function $\Phi_n$, defined by (1.5), may be
considered as an improper distribution function, with the "impropriety"
$1 - (\Phi_n(\infty) - \Phi_n(-\infty)) = 2(1 - \Phi(\sqrt{n})) \sim \sqrt{\frac{2}{\pi n}}\, e^{-n/2}$ for large $n$, which is much
less than $\frac{1}{\sqrt{n}}$. If $n$ is not very large, the tail probability $1 - \Phi_n(z)$ may be
much greater than $1 - \Phi(z)$, which appears to correspond qualitatively to the
fact that the tail of the Student distribution is significantly heavier than the
standard normal tail when the number of degrees of freedom (d.f.) is not large.
This heuristic appears to be confirmed by Figure 1, for $n = 10$; the pictures
for $n = 5$ and $n = 20$ look quite similar.





Fig 1. Logarithms of the ratios of the tail functions 1 — <&(•) (red), 1 — <E>( ■ w n " 1 ) (blue), 
and 1 — "3?n(-) (green) to the tail function of the Student distribution with n — 1 d./. 



It appears that on the interval [1.5, 00) the tail function 1 — $„(■) is closer to that 

of the Student distribution than the tail functions 1 — $(•) and 1 — <t>( • ^zj) 

are. So, while the method of the proof (given in Section 2) appears to allow
one to obtain analogs of Theorems 1.1 and 1.2 for $P(t \le z) - \Phi(z)$ in place of
$P(T \le z) - \Phi(z)$ or $P(t \le z) - \Phi_n(z)$, such analogs will not be pursued here.

Anyway, the following proposition shows that $\Phi_n(z)$ differs from $\Phi(z)$ by
much less than $1/\sqrt{n}$, uniformly in $z \in \mathbb{R}$.



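Proposition 1.4. For all $n > 1$ and $z \in \mathbb{R}$,

$$|\Phi(z) - \Phi_n(z)| < \frac{C}{n - 1}, \tag{1.8}$$

where

$$C := \frac{1}{2} \sup_{u \in \mathbb{R}} |u(1 - u^2)\varphi(u)| = \frac{\kappa \sqrt{1 + \kappa}}{2\sqrt{2\pi}}\, e^{-(1+\kappa)/2} = 0.162\ldots \quad\text{and}\quad \kappa := 1 + \sqrt{3},$$

with $\varphi$ denoting the standard normal density function.
This constant factor, $C$, is the best possible in (1.8).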
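Proposition 1.4 lends itself to a quick numerical sanity check. The sketch below is only an illustration under stated assumptions: the grid, the range of $z$, and the helper names are arbitrary choices, and the value of $C$ is computed from the characterization $2C = \sup_u |u(1-u^2)\varphi(u)|$ used in the proof in Section 2, with the supremum located at $u^2 = 2 + \sqrt{3}$ (a root of $u^4 - 4u^2 + 1 = 0$).

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def phi(x):
    """Standard normal density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi_n(z, n):
    """The improper distribution function defined in (1.5)."""
    return Phi(z / math.sqrt(1.0 + (z * z - 1.0) / n))

# 2C = sup |u (1 - u^2) phi(u)|; the sup is attained at u^2 = 2 + sqrt(3).
u_star = math.sqrt(2.0 + math.sqrt(3.0))
C = 0.5 * abs(u_star * (1.0 - u_star ** 2) * phi(u_star))

def max_gap(n, grid=20001, zmax=50.0):
    """Grid approximation (from below) to sup_z |Phi(z) - Phi_n(z)|."""
    zs = [-zmax + 2.0 * zmax * k / (grid - 1) for k in range(grid)]
    return max(abs(Phi(z) - Phi_n(z, n)) for z in zs)

gaps = {n: max_gap(n) for n in (2, 5, 10, 100, 1000)}
for n, g in gaps.items():
    print(n, g, C / (n - 1))
```

For large $n$ the product $(n-1) \sup_z |\Phi(z) - \Phi_n(z)|$ approaches $C$, which illustrates why the constant cannot be improved.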
One may be concerned that it is more natural to compare the distribution
function of the statistic $t$ (as in (1.4), for general zero-mean $X_i$'s) not with $\Phi$
or $\Phi_n$, but with the distribution function (say $F_{n-1}$) of Student's distribution
with $n - 1$ d.f., that is, with the distribution function of the statistic $t$ for
i.i.d. standard normal $X_i$'s. However, as shown in [10],

$$|F_{n-1}(z) - \Phi(z)| < \frac{C}{n - 1} \quad\text{with } C = 0.158\ldots$$

for all $n \ge 5$ and $z \in \mathbb{R}$. Therefore and in view of Proposition 1.4, $F_{n-1}(z)$ differs
from $\Phi_n(z)$ by much less than $1/\sqrt{n}$, uniformly in $z \in \mathbb{R}$. Thus, Corollary 1.3 is
quite relevant, notwithstanding the mentioned concern.

In the i.i.d. case, Nagaev [6, (1.18)] stated an inequality, which reads as follows
(in the conditions of Theorem 1.2): for all $z \in \mathbb{R}$

| P(T < z) - $(z)\ < (4.4 E LY| 3 + Ml + E \X 2 - 1| 3 ) -L. (1.9) 

However, there are a number of mistakes in the proof of (1.9) in [6]. It is also 
stated in [6], again in the i.i.d. case, that 

|P ( r< a )-^)|<HiW±9. 

V n 

Using Stein's method, Shao [16] obtained a tighter and more general bound,
also with explicit constants but without the i.i.d. assumption:

$$|P(T \le z) - \Phi(z)| \le 10.2\,\gamma_2 + 25\,\gamma_3 \tag{1.10}$$
$$\le 25\,\beta_p / \beta_2^{p/2}$$

for all $p \in [2, 3]$, with the same $\gamma_2$ and $\gamma_3$ as in (1.7). More recently, a Berry-
Esseen bound for $T$ was obtained in [3] for i.i.d. standard normal $X_i$'s by means
of Malliavin calculus.

Let us compare the bounds in (1.1) and (1.10). Here, let us restrict
attention to i.i.d. r.v.'s $X, X_1, \dots, X_n$.

Consider first the case when $X$ has a two-point zero-mean distribution, so that
$P(X \in \{-a, b\}) = 1$ for some positive real numbers $a$ and $b$; that is,

$$P(X = b) = \frac{a}{a + b} = 1 - P(X = -a).$$

This case appears especially interesting, as any zero-mean distribution can be
represented as a mixture of two-point zero-mean distributions; see e.g. [15].
Without loss of generality, assume that $b \ge a$ and $ab = 1$. Then $b \ge 1$ and
$E X^2 = 1$, and hence the bound in (1.1) (with the triple $\tau = \tau_3$ of constants
$A_3, A_4, A_6$) is no greater than $(11.38\rho_3 + 11.02\tilde\rho_4 + 11.78 \times 10^{-6} \tilde\rho_6)/\sqrt{n}$, where
again the $\rho$'s are as in (1.2), so that $\rho_3 = \frac{b^4 + 1}{b^3 + b}$, $\tilde\rho_4 = b - 1/b$, and
$\tilde\rho_6 = (b - 1/b)^3$. On the other hand, if $b > \sqrt{n}/2$, then the bound in (1.10) is no
less than $10.2\, \frac{b}{a + b} \ge 5.1 > 1$. So, without loss of generality $b \le \sqrt{n}/2$ and
hence the bound in (1.10) equals $25\rho_3/\sqrt{n}$. Thus (preferably with the help of
the Mathematica command Reduce or similar tools), one finds that the bound
in (1.10) will be less than the bound in (1.1) only if $b > 469$, that is, only if the
"asymmetry index" $b/a$ is greater than $469^2 = 219961$; moreover, the inequality
$b \le \sqrt{n}/2$ implies that $n$ must be no less than $(2b)^2 > (2 \times 469)^2 = 879844$.
One concludes that, for i.i.d. $X_i$'s with a common two-point distribution, (1.1)
is better than (1.10) unless both the sample size $n$ and the asymmetry index
are very large. Also, in the "symmetric" case when $b = a = 1$, the bound in
(1.1) (with $\tau = \tau_4$) reduces to $1.34/\sqrt{n}$, which is more than 18 times as small as the
bound in (1.10) (for $n \ge (2b)^2 = 4$).
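The two-point comparison can be reproduced without a computer algebra system. The sketch below is an illustration under explicit assumptions: it takes $\rho_3 = E|X|^3$, $\tilde\rho_4 = \sqrt{E|X^2-1|^2}$, and $\tilde\rho_6 = E|X^2-1|^3 / E|X|^3$ as the reading of (1.2), and it compares the numerators of the two $1/\sqrt{n}$ bounds (the triple $\tau_3$ against Shao's $25\rho_3$). The function names are hypothetical.

```python
import math

def two_point_moments(b):
    """Moments of X with P(X = b) = a/(a+b), P(X = -a) = b/(a+b), where a = 1/b."""
    a = 1.0 / b
    p_b, p_a = a / (a + b), b / (a + b)
    mean = b * p_b - a * p_a
    var = b * b * p_b + a * a * p_a
    rho3 = b ** 3 * p_b + a ** 3 * p_a
    rho4 = math.sqrt((b * b - 1.0) ** 2 * p_b + (a * a - 1.0) ** 2 * p_a)
    m6 = abs(b * b - 1.0) ** 3 * p_b + abs(a * a - 1.0) ** 3 * p_a
    rho6 = m6 / rho3  # assumed reading of the third quantity in (1.2)
    return mean, var, rho3, rho4, rho6

def numerator_tau3(b):
    """sqrt(n) times the upper estimate of the bound in (1.1) with the triple tau_3."""
    _, _, r3, r4, r6 = two_point_moments(b)
    return 11.38 * r3 + 11.02 * r4 + 11.78e-6 * r6

def numerator_shao(b):
    """sqrt(n) times the bound in (1.10), valid when b <= sqrt(n)/2."""
    return 25.0 * two_point_moments(b)[2]

for b in (1.0, 10.0, 469.0, 470.0):
    print(b, numerator_tau3(b), numerator_shao(b))
```

Under these assumptions the crossover falls between $b = 469$ and $b = 470$, in agreement with the threshold quoted in the text, and the closed forms for $\rho_3$, $\tilde\rho_4$, $\tilde\rho_6$ can be confirmed against the direct moment computations.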

While the two-point distributions may be of particular interest, they have
bounded support, and hence all their moments are finite. On the other hand,
one may object that the bounds given in Theorems 1.1 and 1.2 will be infinite
and hence useless if the 4th-order moments of the $X_i$'s are infinite. However,
this concern is easily addressed via truncation.

For a moment, let $X$ denote any zero-mean r.v. If the distribution of $X$ is
continuous, then for each $b \in [0, \infty]$ there is some $a \in [0, \infty]$ such that the r.v.
$X^{a,b} := X\, \mathrm{I}\{-a < X < b\}$ is zero-mean; the same holds in the case when the
distribution of $X$ is symmetric (about 0), in which case one can simply take $a = b$.
If the zero-mean distribution of $X$ is not continuous or symmetric, one can
use randomization, say as in [15], to still find, for each $b \in [0, \infty]$, some
$a \in [0, \infty]$ and some zero-mean r.v. $X^{a,b}$ such that $P(-a \le X^{a,b} \le b) = 1$ and
$X^{a,b} = X$ on the event $\{-a < X < b\}$; let us refer to any such r.v. $X^{a,b}$ as
a zero-mean truncation of the zero-mean r.v. $X$. (One could similarly base an
appropriate construction on the so-called Winsorization $(-a) \vee (X \wedge b)$ instead
of the truncation $X\, \mathrm{I}\{-a < X < b\}$.)
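For a continuous distribution, the lower cut-off $a$ matching a given upper cut-off $b$ can be found by bisection, because $E[X;\, -a < X < b]$ is nonincreasing in $a$. The following sketch illustrates this for one concrete example chosen here for convenience (it does not appear in the text): $X = \mathcal{E} - 1$ with $\mathcal{E}$ standard exponential, for which the partial mean has a closed form. The function names and the tolerance are assumptions.

```python
import math

def partial_mean(lo, hi):
    """E[X; lo < X < hi] for X = E - 1 with E ~ Exp(1), so that X >= -1 and EX = 0.

    Uses the antiderivative G(x) = -(x + 1) * exp(-(x + 1)) of x * exp(-(x + 1)).
    Both endpoints are assumed finite.
    """
    G = lambda x: -(x + 1.0) * math.exp(-(x + 1.0))
    lo = max(lo, -1.0)  # the density vanishes below -1
    return G(hi) - G(lo)

def lower_cut(b, a_max=1.0, tol=1e-12):
    """Bisection for a in [0, a_max] with E[X; -a < X < b] = 0."""
    lo, hi = 0.0, a_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if partial_mean(-mid, b) > 0.0:
            lo = mid  # still too much positive mass: cut further to the left
        else:
            hi = mid
    return 0.5 * (lo + hi)

for b in (0.5, 2.0, 5.0):
    a = lower_cut(b)
    print(b, a, partial_mean(-a, b))
```

In this example $a$ is uniquely determined by $b$ and increases to 1 as $b \to \infty$, since less and less of the right tail is removed.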

Now let $X_1, \dots, X_n$ be zero-mean r.v.'s as in Theorem 1.1 or 1.2. Respectively,
let $B(X_1, \dots, X_n)$ denote (for any of the triples $\tau_1, \dots, \tau_4, \tilde\tau_{1,1}, \dots, \tilde\tau_{4,1}$)
either one of the bounds in (1.1) or (1.3), as it depends on (the individual
distributions of) the $X_i$'s. So, $B(X_1, \dots, X_n)$ denotes the bound in (1.1) under
the conditions of Theorem 1.1, and it denotes the bound in (1.3) under the
conditions of Theorem 1.2. The following corollary of Theorems 1.1 and 1.2 is
immediate:

Corollary 1.5. Under the conditions of Theorem 1.1 or 1.2, for each
$i \in \{1, \dots, n\}$ let $X_i^{a_i,b_i}$ be a zero-mean truncation of $X_i$. Then for all $z \in \mathbb{R}$

$$|P(T \le z) - \Phi(z)| \le P\Bigl(\bigcup_{i=1}^n \{X_i \notin (-a_i, b_i)\}\Bigr) + B\bigl(X_1^{a_1,b_1}, \dots, X_n^{a_n,b_n}\bigr). \tag{1.11}$$

Note that the upper bound in (1.11) can be expressed only in terms of the
individual distributions of the $X_i$'s (rather than their joint distribution), since,
by independence,

$$P\Bigl(\bigcup_{i=1}^n \{X_i \notin (-a_i, b_i)\}\Bigr) = 1 - \prod_{i=1}^n P\bigl(X_i \in (-a_i, b_i)\bigr).$$

So, when the bound in (1.1), (1.3), (1.6), or (1.10) can be computed, usually
the "truncated" bound in (1.11) can be computed as well.

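One may want to compare the bound in (1.11) with that in (1.10) or even
with the "truncated" version of the latter bound:

$$|P(T \le z) - \Phi(z)| \le P\Bigl(\bigcup_{i=1}^n \{X_i \notin (-a_i, b_i)\}\Bigr) + 10.2\,\tilde\gamma_2 + 25\,\tilde\gamma_3, \tag{1.12}$$

where $\tilde\gamma_2$ and $\tilde\gamma_3$ are obtained from $\gamma_2$ and $\gamma_3$ by replacing the $X_i$'s with their
zero-mean truncations $X_i^{a_i,b_i}$ as in Corollary 1.5.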

Let us make such a comparison when the $X_i$'s are i.i.d. with a common
distribution, which is either the Student distribution with $d > 0$ degrees of
freedom or the (centered) Pareto distribution with the density

$$f_s(x) = s \Bigl(x + \frac{s}{s-1}\Bigr)^{-s-1}\, \mathrm{I}\Bigl\{x > -\frac{1}{s-1}\Bigr\},$$

where $s$ is a parameter with values in the interval $(1, \infty)$. Clearly, Student's
distribution with $d$ degrees of freedom is symmetric, with heavy tails for small
$d$ and light ones for large $d$, whereas the Pareto distribution with parameter
$s$ is highly skewed to the right, with a heavy right tail for small $s > 1$ and a
light one for large $s$. In keeping with the "i.i.d." assumption, let us consider
the "truncated" bounds in (1.11) and (1.12) with $b_1 = \dots = b_n =: b$ and,
accordingly, $a_1 = \dots = a_n =: a$; note that in each of the two cases under
consideration (Student's or Pareto's), the value of $a$ is uniquely determined
by that of $b$. Then, moreover, let us (numerically) minimize the "truncated"
bounds in $b$. The results are shown in Figures 2 and 3. There, the graphs are
shown: of the bound in (1.10) (blue), of the minimized "truncated" bound (1.12)
(magenta), of the bound in (1.1) (red), and of the minimized "truncated" bound
in (1.11) (green), for sample sizes $n \in \{10, 100, 1000, 10000\}$, $d \in [2.5, 20]$,
and $s \in [3.5, 20]$; here, for the "red" and "green" bounds the triple
$\tau_2 = (2.01, 1.02, 0.61)$ of constant factors in (1.1) is used.
Fig 2. The bounds in the case of Student's distribution with d degrees of freedom.




Fig 3. The bounds in the case of Pareto's distribution with parameter s. 



These pictures suggest the following. 

1. Predictably, truncation helps significantly only when the tails are heavy
enough, that is, for small enough values of the parameters $d$ and $s$.
Predictably as well, truncation is much more useful with the bound in
(1.1) than it is with that in (1.10).

imsart-generic ver. 2009/05/21 file: arxiv2.tex date: May 24, 2012

Iosif Pinelis/Berry-Esseen for Student

2. For Student's and Pareto's distributions, even the minimized "truncated"
bound in (1.12) is nontrivial (that is, less than 1) only if $n$ is greater
than 1000 (or even a few thousand). In fact, this bound is not much less
than 0.5 even for $n = 10000$ and light tails. For instance, for $n = 10000$
and Student's distribution with $d = 20$ d.f., the bound in (1.10) and
the minimized bound in (1.12) are both $\approx 0.417$, whereas the bound in
(1.1) and the minimized bound in (1.11) are both $\approx 0.068$ (again, with
$\tau = \tau_2 = (2.01, 1.02, 0.61)$).

3. Figure 3, for the Pareto case, as well as other considerations (see e.g.
[14, 15] and discussion therein) suggest that the Student statistic may not
be appropriate for statistical inference when the underlying distribution
is significantly skewed. Alternative statistics, "correcting" for the
asymmetry, were offered and considered; see [14, 15] and discussion therein.

4. If the tails are very heavy, then even the minimized "truncated", "green"
bound in (1.11) is not much less than 1 even if $n$ is as large as 1000 and the
underlying distribution is symmetric. This may be considered to be in broad
agreement with the fact, established in [5], that if the underlying
distribution is in the domain of attraction of a stable law with index $\alpha < 2$,
then the limit distribution of the self-normalized sum and, equivalently,
that of the Student statistic is not normal.

5. For almost all considered values of $n$, $d$, and $s$, the minimized "truncated"
bound in (1.11) is significantly less than that in (1.12), except in the Pareto
case with $n = 10000$ for a rather short interval of values $s$ near 7, where,
however, even the better bound is only slightly less than 1. Conceivably,
this deficiency might be fixed by using another triple of constants in place
of the triple $\tau_2 = (2.01, 1.02, 0.61)$. Moreover, when the tails are light
enough, even the "non-truncated" bound in (1.1) significantly improves
both on the "truncated" and "non-truncated" bounds in (1.12) and (1.10).
Thus, especially with the truncation tool, getting smaller constant factors
may be more effective than insisting on the optimal order of moments even
for the price of much greater constants.

It appears that, with the much smaller constant factors than in the preceding
results, the bounds presented above may be approaching the state of being of
use in statistical practice. There are additional resources to be tapped. For
instance, the proofs of Theorems 1.1 and 1.2 rely to a large extent on a hybrid
between the Chebyshev and Cantelli bounds, developed in [9] specifically for the
purposes of the present paper. One can similarly try to use and/or develop the
much more accurate (but also much more complicated) upper bounds on large
deviation probabilities given and discussed in [11]; however, the proofs
can then be expected to be much harder to produce or read.
2. Proofs

Proof of Proposition 1.4. Introduce $A(a, z) := \Phi(u_{a,z})$, where $a \in [0, 1)$ and
$u_{a,z} := \frac{z}{\sqrt{1 + a(z^2 - 1)}}$, so that $\Phi_n(z) = A(\frac{1}{n}, z)$ and $\Phi(z) = A(0, z)$. By the mean
value theorem, for some $b = b_z \in (0, a)$

$$\frac{A(a, z) - A(0, z)}{a} = \frac{\partial A}{\partial a}(b, z) = \frac{u_{b,z}(1 - u_{b,z}^2)\,\varphi(u_{b,z})}{2(1 - b)},$$

where $\varphi$ is the standard normal density function. So, to prove inequality (1.8),
it suffices to note that $\sup_{u \in \mathbb{R}} |u(1 - u^2)\varphi(u)| = 2C$ and $\frac{1}{1-b} < \frac{1}{1-a} = \frac{n}{n-1}$ for
$b \in (0, a)$ and $a = \frac{1}{n}$. That the constant factor $C$ is the best possible in (1.8)
follows because, by l'Hospital's rule, $\frac{A(a,z) - A(0,z)}{a} \to \frac{\partial A}{\partial a}(0, z)$ as $a \downarrow 0$. □

The proof of Theorem 1.1 is based, in part, on the following two lemmas. 

Lemma 2.1. Take any X, r* } a, b in (0,oo). Take any c and r in (0, 00) such 
that 

A 

c — ana r ^ r*. 
r 



(2.1) 



Lei Y" by any r.v. such that EY = ana! cr := \JEY 2 G (0, 00). Then 

. \ / ^\ / \ u Av 

r(Y ^ c) ^ V r *j - ) r i where ip(u, v) := — - — r^. 

Proof of Lemma 2.1. By the condition $c \ge \lambda/r$ and Cantelli's inequality,
$P(Y \ge c) \le P(Y \ge \lambda/r) \le \frac{\sigma^2}{\sigma^2 + (\lambda/r)^2} = \frac{r^2}{r^2 + v^2}$, where $v := \lambda/\sigma$.
Note that $\frac{r}{r^2 + v^2}$ increases in $r \in [0, v]$ and decreases in $r \in [v, \infty)$. So, if $r_* \le v$,
then the condition $r \le r_*$ implies $\frac{r}{r^2 + v^2} \le \frac{r_*}{r_*^2 + v^2} = \psi(r_*, v)$. If now $r_* \ge v$, then
$\frac{r}{r^2 + v^2} \le \frac{v}{v^2 + v^2} = \psi(r_*, v)$, so that the inequality in (2.1) holds in this case as
well. As for inequality (2.2), it is given in [9]. □
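The Cantelli step in this proof can be illustrated numerically. The sketch below makes two assumptions that should be read as such: it takes $\psi(u, v) := \frac{u \wedge v}{(u \wedge v)^2 + v^2}$, which is the form of $\psi$ implied by the two cases of the proof, and it uses hypothetical helper names. It checks that the Cantelli bound $\frac{r^2}{r^2 + v^2}$ never exceeds $\psi(r_*, v)\, r$ for $r \le r_*$, and it exhibits the two-point distribution for which Cantelli's inequality is an equality.

```python
def cantelli(c, sigma):
    """Cantelli's bound on P(Y >= c) for zero-mean Y with standard deviation sigma, c > 0."""
    return sigma ** 2 / (sigma ** 2 + c ** 2)

def psi(u, v):
    """Assumed form of psi: (u ^ v) / ((u ^ v)^2 + v^2), with ^ denoting the minimum."""
    m = min(u, v)
    return m / (m * m + v * v)

def lemma_bound(r, r_star, lam, sigma):
    """Right-hand side psi(r_*, lam/sigma) * r of the first inequality of Lemma 2.1."""
    return psi(r_star, lam / sigma) * r

def extremal_two_point(c, sigma):
    """Zero-mean two-point law with variance sigma^2 attaining equality in Cantelli."""
    p = cantelli(c, sigma)
    return [(c, p), (-sigma ** 2 / c, 1.0 - p)]

values = extremal_two_point(2.0, 1.5)
mean = sum(x * p for x, p in values)
var = sum(x * x * p for x, p in values)
print(values, mean, var)
```

The extremal example shows that Cantelli's inequality itself cannot be improved; the factor $\psi(r_*, v)$ only repackages it as a bound that is linear in $r$.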

Lemma 2.2. For any positive real numbers $x, x_1, x_2$ such that $x \ge x_1 \ge x_2$, one
has

$$x_1 \bar\Phi(x) \le \Upsilon(x_2), \tag{2.3}$$

where

$$\bar\Phi := 1 - \Phi \quad\text{and}\quad \Upsilon(x) := 0.17\, \mathrm{I}\{0 < x \le 0.752\} + x \bar\Phi(x)\, \mathrm{I}\{x > 0.752\}.$$

Proof of Lemma 2.2. It is well known that the function $\bar\Phi$ is log-concave; see e.g.
[4, 12]. So, the function $L$ defined on $(0, \infty)$ by the formula $L(x) := \ln(x \bar\Phi(x))$
is concave, and hence $L(x) \le L(x_0) + L'(x_0)(x - x_0)$ for any $x$ and $x_0$ in $(0, \infty)$.
Also, $L'(0.751) > 0 > L'(0.752)$ and $L(0.752) + L'(0.752)(0.751 - 0.752) < \ln 0.17$.
This implies that $L < \ln 0.17$ on $(0, \infty)$ and $L$ is decreasing on $(0.752, \infty)$,
whence $\sup\{x \bar\Phi(x) : x \ge z\} \le \Upsilon(z)$ for all $z \in (0, \infty)$. Now the lemma follows.

□
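The numerical facts used in this proof are easy to confirm. The following sketch is a grid-based check, not a proof: the grid step and range are arbitrary choices, and the definition of $\Upsilon$ coded here is the one read off from the statement of Lemma 2.2. It locates the maximizer of $x\bar\Phi(x)$ and compares tail suprema with $\Upsilon$.

```python
import math

def Phi_bar(x):
    """Standard normal tail function 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def g(x):
    """The function x * (1 - Phi(x)) whose logarithm is L in the proof."""
    return x * Phi_bar(x)

def Upsilon(x):
    """0.17 on (0, 0.752], and x * (1 - Phi(x)) beyond 0.752."""
    return 0.17 if 0.0 < x <= 0.752 else g(x)

grid = [k * 1e-4 for k in range(1, 100001)]  # the interval (0, 10]
x_max = max(grid, key=g)

def tail_sup(z):
    """Grid approximation to sup{ x * (1 - Phi(x)) : x >= z }."""
    return max(g(x) for x in grid if x >= z)

print(x_max, g(x_max))
for z in (0.1, 0.5, 0.752, 1.0, 2.0, 3.0):
    print(z, tail_sup(z), Upsilon(z))
```

The maximizer lies between 0.751 and 0.752, matching the sign change of $L'$ stated in the proof, and the maximum stays just below 0.17.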

Proof of Theorem 1.1. This proof uses some of the ideas in the proof of (1.9) 
in [6], which were previously presented in [7, 8]. As mentioned before, there 
are a number of mistakes of various kinds in the proof in [6]. For instance (in 
the notations of [6]), a bound on l^(y=~jfy) — $( ( 1 ^) r )\ analogous to that 
on |j>( ^-T° ^ ) — $(-2=)| in [6, (1.12)] is missing there; moreover, the same 
bound in [6, (1.12)] must have ( a ^ A l) 2 instead of ( g ^ A 1). We have also 
produced and utilized some new ideas in this proof. One of them is presented 
in Lemma 2.1 above, which depends on the result of [9], specifically developed 
for the purposes of the present paper. 
Without loss of generality, assume that 



Take any 



/8a = 1. 



«e(o,oo), e 4 e(o,|), £ 3 e(o,oo), e 2 e(o,i), 



(2.4) 



and introduce 



and also 



? 3 e(0,l), 4 e(O,oo) 

= A(z):=P(T^)-$(z) (2.5) 



?'3 := 03, U := P\ /2 , r e := (2.6) 
$$\varepsilon := \kappa r_4, \qquad \tilde\varepsilon_4 := \frac{\varepsilon_4}{\kappa}. \tag{2.7}$$

It suffices to show that

$$|\Delta| = |\Delta(z)| \le A_3 r_3 + A_4 r_4 + A_6 r_6,$$

where without loss of generality let us assume that

$$z \ge 0.$$

Consider the following three cases.

Case 1 ("small n"): $\varepsilon \ge \varepsilon_4$ or $r_3 \ge \varepsilon_3$. Note that

$$\varepsilon \ge \varepsilon_4 \iff r_4 \ge \tilde\varepsilon_4.$$

So,

$$|\Delta| \le 1 \le (A_{3,1} r_3) \vee (A_{4,1} r_4) \vee (A_{6,1} r_6)
\le A_{3,1} r_3 + A_{4,1} r_4 + A_{6,1} r_6, \quad\text{where} \tag{2.8}$$

$$A_{3,1} := \frac{1}{\varepsilon_3}, \qquad A_{4,1} := \frac{1}{\tilde\varepsilon_4} = \frac{\kappa}{\varepsilon_4}, \qquad A_{6,1} := 0.$$
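Case 2 ("large n" & "large deviations"): $\varepsilon < \varepsilon_4$ & $r_3 < \varepsilon_3$ & $z \ge \frac{\theta_3}{r_3} \wedge \frac{\theta_4}{r_4}$.
Then, by (2.5) and (2.1),

$$|\Delta| \le (P_1 + P_2) \vee \bar\Phi(z),$$

where

$$P_1 := P(T > z,\ V > 1 - \varepsilon_2) \le P\bigl(S > (1 - \varepsilon_2) z\bigr) \le \bigl(\psi(\varepsilon_3, \tilde\theta_3)\, r_3\bigr) \vee \bigl(\psi(\tilde\varepsilon_4, \tilde\theta_4)\, r_4\bigr),$$
$$\tilde\theta_j := (1 - \varepsilon_2)\,\theta_j,$$
$$P_2 := P(V \le 1 - \varepsilon_2) = P\Bigl(\sum_{i=1}^n (E X_i^2 - X_i^2) \ge \tilde\varepsilon_2\Bigr) \le \psi(\tilde\varepsilon_4, \tilde\varepsilon_2)\, r_4,$$
$$\tilde\varepsilon_2 := \varepsilon_2 (2 - \varepsilon_2).$$

Note also that the currently assumed case conditions $\varepsilon < \varepsilon_4$ & $r_3 < \varepsilon_3$ &
$z \ge \frac{\theta_3}{r_3} \wedge \frac{\theta_4}{r_4}$ imply $z \ge \frac{\theta_3}{r_3} \ge \frac{\theta_3}{\varepsilon_3}$ or $z \ge \frac{\theta_4}{r_4} \ge \frac{\theta_4}{\tilde\varepsilon_4}$. So, Lemma 2.2 yields

$$\bar\Phi(z) \le \Bigl[\Upsilon\Bigl(\frac{\theta_3}{\varepsilon_3}\Bigr) \frac{r_3}{\theta_3}\Bigr] \vee \Bigl[\Upsilon\Bigl(\frac{\theta_4}{\tilde\varepsilon_4}\Bigr) \frac{r_4}{\theta_4}\Bigr].$$

Thus,

$$|\Delta| \le A_{3,2} r_3 + A_{4,2} r_4 + A_{6,2} r_6, \quad\text{where} \tag{2.9}$$
$$A_{3,2} := \psi(\varepsilon_3, \tilde\theta_3) \vee \Bigl[\Upsilon\Bigl(\frac{\theta_3}{\varepsilon_3}\Bigr) \frac{1}{\theta_3}\Bigr],$$
$$A_{4,2} := \bigl[\psi(\tilde\varepsilon_4, \tilde\theta_4) + \psi(\tilde\varepsilon_4, \tilde\varepsilon_2)\bigr] \vee \Bigl[\Upsilon\Bigl(\frac{\theta_4}{\tilde\varepsilon_4}\Bigr) \frac{1}{\theta_4}\Bigr],$$
$$A_{6,2} := 0.$$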

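Case 3 ("large n" & "moderate deviations"): $\varepsilon < \varepsilon_4$ & $r_3 < \varepsilon_3$ & $z < \frac{\theta_3}{r_3} \wedge \frac{\theta_4}{r_4}$.
In this case, note that

$$\{T \le z\} = \{T_z \le z\}, \quad\text{where } T_z := S - z\bigl(\sqrt{1 + \eta} - 1\bigr) \text{ and } \eta := V^2 - 1.$$

Note also that the expression $S - z(\sqrt{1 + \eta} - 1)$ for $T_z$ is convex in $(S, \eta)$, so
that its linear approximation (at the point $(E S, E \eta) = (0, 0)$)

$$S_z := S - z\eta/2$$

never exceeds $T_z$; more specifically, $T_z = S_z + z\delta$, where

$$\delta := 1 + \frac{\eta}{2} - \sqrt{1 + \eta} \ge 0,$$

whence

$$P\bigl(S_z \le (1 - \varepsilon) z\bigr) - P(\delta > \varepsilon) \le P(T \le z) \le P(S_z \le z).$$

In view of (2.5), it follows that

$$\Delta \le BE + D(1) \quad\text{and}\quad -\Delta \le BE + P(\delta > \varepsilon) + D(1 - \varepsilon) + D_\varepsilon, \quad\text{where}$$

$$BE := \sup_{x \in \mathbb{R}} \Bigl| P(S_z \le x) - \Phi\Bigl(\frac{x}{\sigma_z}\Bigr) \Bigr|, \qquad \sigma_z := \sqrt{E S_z^2},$$
$$D(u) := \Bigl| \Phi(uz) - \Phi\Bigl(\frac{uz}{\sigma_z}\Bigr) \Bigr|, \tag{2.10}$$
$$D_\varepsilon := \sup_{x \in (0, \infty)} \bigl[\Phi(x) - \Phi((1 - \varepsilon) x)\bigr]. \tag{2.11}$$

Thus,

$$|\Delta| \le BE + P(\delta > \varepsilon) + D(1) \vee D(1 - \varepsilon) + D_\varepsilon. \tag{2.12}$$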



S z = J~]X iiZ , where 
i 

X M := X, - z^/2 and Y % := Xf - El, 2 , whence 

n 

i 

By a recent result of Shevtsova [17], 



BE sC 0.56 



3.- 



ft 



(l-a)s 



(2.10) 
(2.11) 

(2.12) 



(2.13) 



where 

1 1 

for any 

ae (0,1); 

the second inequality in (2.14) follows from the elementary inequality (a + b) 3 
(i ° a )2 + ^2 for all a and 6 in [0, oo) and a € (0, 1). Recalling also the condition 
z < 77 A 77 an( i definitions (2.6), one has 

1 



(2.15) 



(l-«) : 



#3 

8cr 



(2.16) 
Next,

$$\sigma_z^2 = \sum_{i=1}^n E X_{i,z}^2 = 1 + \Bigl(\frac{z}{2}\Bigr)^2 \tilde\beta_4 - z \sum_{i=1}^n E X_i^3 \ge 1 - z\beta_3 > 1 - \theta_3. \tag{2.17}$$



So, (2.13) and (2.16) yield 



BE s$ 



0.56 



(i - e 3 yn- V(i - a ) 



8a 2 



(2.18) 



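Further, since $0 < \varepsilon < \varepsilon_4 \le \frac{1}{2}$, one has $\delta > \varepsilon \iff \eta \notin [2\varepsilon - 2\sqrt{2\varepsilon},\ 2\varepsilon + 2\sqrt{2\varepsilon}]$.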

So, by (2.2), (2.6), and (2.7), 



p( , >£)< 4ri + 16^_l + 4^ 



32e 



8k 



r 4 . 



Next, by (2.10), for any u g [0, 00), 
D(it) ^ uz 



<T Z V 1 



(2.19) 



(2.20) 



where tp is the standard normal density function. By the equalities in (2.17) and 

e 4 



the case conditions e < £4 and z < — A — , 



W - IK *A + (f )ft = -3 + (£)%* < zr 3 + (|) : 



e 4 r 4 



and 



(2.21) 



|a 2 -lKzr 3 



< ^3 + 0l/4. 



Writing | - 1| = and using (2.20) and (2.21), one has 



£>(u) ^ Dx(u) + D 2 {u), where 

n / \ v 2 <p{v) . . e 4 rV(t>) 
L>i(w):=r 3 — - — p 2 , L> 2 (w):=r 4 - — — p 3 , 



If a z ^ 1, then by (2.17) for j = 2,3 

1 



4 u 2 

V 1)3 



CJ 2 + cr; 



ft = 



cr 2 +(TJ 

If er 2 > 1, then by (2.22) for j = 2, 3 
0-3 1 



a/1 



ft 



C 2 + cr, cr 



-i-J _L 2-2 ^ ft*^' : 1-j , 2-2 



(7* + <X* 



(2.22) 






where 

ct* := ^1 + 03 + 01/4. 

Note also that 

supir<p(w) = Sj := 



for j = 2, 3. Therefore, recalling also the condition e < £4, one has 
D(l) V D(l - e) < r 3 (p* V p**, 2 ) + r 4 *± * 3 (p, V p M>3 ). (2.23) 

1 — £4 4(1 — £4j z 

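Next, let us estimate $D_\varepsilon$. First here, one can use a special-case l'Hospital-
type rule for monotonicity, such as [13, Proposition 4.1], to see that for each
$x \in (0, \infty)$ the ratio $\frac{\Phi(x) - \Phi((1 - t)x)}{t}$ increases in $t \in (0, 1)$. On the other hand,
for each $t \in (0, 1)$ the expression $\Phi(x) - \Phi((1 - t)x)$ attains its maximum in
$x \in (0, \infty)$ at $x = x_t$, where

$$x_t := \sqrt{\frac{-2 \ln(1 - t)}{t(2 - t)}}.$$

On recalling also the definition (2.11) of $D_\varepsilon$ and the conditions $0 < \varepsilon < \varepsilon_4 \le \frac{1}{2}$,
it follows that

$$D_\varepsilon \le R(\varepsilon_4)\,\varepsilon = R(\varepsilon_4)\,\kappa r_4, \quad\text{where}\quad R(\varepsilon_4) := \frac{\Phi(x_{\varepsilon_4}) - \Phi\bigl((1 - \varepsilon_4) x_{\varepsilon_4}\bigr)}{\varepsilon_4}. \tag{2.24}$$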
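The estimate of $D_\varepsilon$ rests on the closed form of $x_t$ and on the monotonicity of the ratio in $t$; both can be checked numerically. The sketch below is an illustration only: the grid and the particular values of $t$, $\varepsilon$, and $\varepsilon_4$ are arbitrary choices, and the function names are hypothetical.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def x_t(t):
    """Maximizer in x of Phi(x) - Phi((1 - t) x), for t in (0, 1)."""
    return math.sqrt(-2.0 * math.log(1.0 - t) / (t * (2.0 - t)))

def gap(x, t):
    """The expression Phi(x) - Phi((1 - t) x)."""
    return Phi(x) - Phi((1.0 - t) * x)

def R(eps4):
    """The factor R(eps_4) from (2.24)."""
    return gap(x_t(eps4), eps4) / eps4

def D(eps, grid_step=1e-3, x_max=10.0):
    """Grid approximation (from below) to D_eps = sup_{x > 0} gap(x, eps)."""
    steps = int(x_max / grid_step)
    return max(gap(k * grid_step, eps) for k in range(1, steps + 1))

eps, eps4 = 0.1, 0.363  # illustrative values with 0 < eps < eps4 <= 1/2
print(x_t(eps4), R(eps4), D(eps), R(eps4) * eps)
```

The point of the bound $D_\varepsilon \le R(\varepsilon_4)\,\varepsilon$ is that it is linear in $\varepsilon = \kappa r_4$, so it can be absorbed into the coefficient of $r_4$.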

Collecting (2.12), (2.18), (2.19), (2.23), and (2.24), one bounds |A| in Case 3 
as follows: 

|A| ^A 3 , 3 r 3 + A 4 . 3 r 4 + A 6 , 3 r 6 , where (2.25) 
0.56 s 2 (p*V p**a) 



A 



3,3 



(1 -6*3)3/2(1 -a) 2 l-£ 4 



-44,3 := — 5 + — 77; T2 — + R{£i)K, 

SK 4(1 — £4) z 



-4 



0.07/ 01 \3/2 



6,3 



a 2 VI 



Collecting now the bounds (2.8), (2.9), and (2.25) on $|\Delta|$ in Cases 1-3, one
concludes that in all of the three cases

$$|\Delta| \le A_3 r_3 + A_4 r_4 + A_6 r_6, \quad\text{where} \tag{2.26}$$
$$A_p := A_{p,1} \vee A_{p,2} \vee A_{p,3}$$

for $p = 3, 4, 6$.

Now one can arbitrarily select positive "weights" $w_3, w_4, w_6$ and then try
numerical minimization of (say) $(w_3 A_3) \vee (w_4 A_4) \vee (w_6 A_6)$ with respect to all
the parameters: $\alpha, \varepsilon_4, \varepsilon_3, \varepsilon_2, \kappa, \theta_3, \theta_4$, within their specified ranges; recall
(2.4) and (2.15). The target function here appears to have a great number of
local minima, and so, it is hardly possible to find the global minimum. Even
though the numerical minimization is imperfect, it should be clear that the
bound in (2.26) holds for all the allowable values of the parameters as specified
in (2.4). The following table shows the values of the parameters $\alpha, \varepsilon_4, \varepsilon_3, \varepsilon_2,
\kappa, \theta_3, \theta_4$ found by the mentioned numerical minimization for each of a few
selected triples $(w_3, w_4, w_6)$, as well as the resulting triple $\tau_i$ of the coefficients
$(A_3, A_4, A_6)$, corresponding to the so obtained values of the parameters.



w_3   w_4     w_6     α          ε_4        ε_3        ε_2        κ         θ_3       θ_4        triple
1     1       1       2/25       123/10^3   2703/10^3  22/125     43/250    377/10^3  5407/10^3  τ_1
1     2       1       27/200     363/10^3   1401/10^3  19/50      91/250    413/10^3  3167/10^3  τ_2
1     1       10^6    381/500    471/10^3   6927/10^3  23/10^3    79/50     9/200     3809/10^3  τ_3
1     10^-5   10^-6   8.39/10^5  3.17/10^5  1.32       3.49/10^5  9.97/10'  0.3738    2.69       τ_4



Now Theorem 1.1 is completely proved. □ 



Proof of Theorem 1.2. This proof is quite similar to that of Theorem 1.1. The
only essential difference is that, instead of the constant 0.56 in (2.13), one can
now use the better constant 0.4785, according to a recent result of Tyurin [18].
Because we cannot find the global minima, it sometimes turns out that the
numerical minimization with the better constant 0.4785 produces results worse
(or not quite better) than those obtained using the worse constant 0.56. (!) In
such cases, we used the values of the parameters $\alpha, \varepsilon_4, \varepsilon_3, \varepsilon_2, \kappa, \theta_3, \theta_4$ found in
the general, non-i.i.d. setting, with the worse constant 0.56 and with the same
weights $(w_3, w_4, w_6)$; the resulting triples are denoted as $\tilde\tau_{i,2}$, with the second
subscript 2. Otherwise, the triple's second subscript is 1, as in $\tilde\tau_{1,1}$, $\tilde\tau_{3,1}$, and
$\tilde\tau_{4,1}$. See the table below.



w_3   w_4     w_6     α          ε_4        ε_3        ε_2        κ         θ_3       θ_4        triple
1     1       1       41/500     113/500    277/100    39/200     83/500    409/10^3  4467/10^3  τ̃_{1,1}
1     1       1       2/25       123/10^3   2703/10^3  22/125     43/250    377/10^3  5407/10^3  τ̃_{1,2}
1     2       1       27/200     363/10^3   1401/10^3  19/50      91/250    413/10^3  3167/10^3  τ̃_{2,2}
1     2.1     1       0.14       275/10^3   6.7        0.42       0.27      0.44      3.2        τ̃_{2.1,1}
1     1       10^6    777/10^3   1/2        1381/500   27/10^3    451/10^3  47/10^3   4569/500   τ̃_{3,1}
1     10^-5   10^-6   3/10^5     43/10^5    10.3       13/10^5    3.5       0.401     1.6        τ̃_{4,1}



For instance, one can see that the values of the parameters $\alpha, \varepsilon_4, \varepsilon_3, \varepsilon_2, \kappa, \theta_3,
\theta_4$ resulting in the triple $\tilde\tau_{1,2}$ are the same as those used to obtain the triple $\tau_1$.
Similarly, the values of the parameters for the triple $\tilde\tau_{2,2}$ are the same as those for
the triple $\tau_2$. □